ABSTRACT: In this paper, we investigate an AM-FM model for representing
modulations in speech resonances. Specifically, we propose a frequency
modulation (FM) model for the time-varying formants whose amplitude varies
as the envelope of an amplitude-modulated (AM) signal. To detect the
modulations we apply the energy operator $\Psi(x) = (\dot{x})^2 - x\ddot{x}$
and its discrete counterpart. We found that $\Psi$ can approximately track
the envelope of AM signals, the instantaneous frequency of FM signals, and
the product of these two functions in the general case of AM-FM signals.
Several experiments are reported on the application of this AM-FM modeling
to speech signals, bandpass filtered via Gabor filtering.

1 Introduction 

In his work on nonlinear modeling of speech production, Teager 
[1, 2] used the nonlinear operator 

$$\Psi_d[x(n)] = x^2(n) - x(n-1)\,x(n+1) \qquad (1)$$

on speech-related discrete-time signals $x(n)$. Kaiser [3] analyzed
$\Psi_d$ and showed that it can detect the frequency of single sinusoids
and chirp signals, and has many useful properties; e.g.,

$$\Psi_d[A r^n \cos(\Omega_0 n + \theta)] = A^2 r^{2n} \sin^2(\Omega_0) \qquad (2)$$

Kaiser [4] recently introduced an operator closely related (see
Section 3) to $\Psi_d$ for continuous-time signals $x(t)$:

$$\Psi_c[x(t)] = [\dot{x}(t)]^2 - x(t)\,\ddot{x}(t) \qquad (3)$$

where $\dot{x} = dx/dt$, and investigated several properties of $\Psi_c$; e.g.,

$$\Psi_c[A e^{rt} \cos(\omega_0 t + \theta)] = A^2 e^{2rt} \omega_0^2 \qquad (4)$$

$$\Psi_c[x(t)\,y(t)] = x^2(t)\,\Psi_c[y(t)] + y^2(t)\,\Psi_c[x(t)] \qquad (5)$$

$\Psi_c$ was originally derived to track the energy of a linear undamped
oscillator. Namely, when $\Psi_c$ is applied to the oscillation signal
$A\cos(\omega_0 t)$, its output is $(A\omega_0)^2$ and hence proportional to
the energy of the source producing the oscillation. Thus $\Psi_c$ can
be viewed as an energy operator.
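
As a quick numerical check of (1) and (2), the following sketch implements the discrete operator in plain Python and applies it to a damped cosine. The function name `teager_kaiser` and all parameter values are illustrative assumptions, not taken from the experiments reported here.

```python
import math

def teager_kaiser(x):
    """Discrete energy operator (1): psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns values for the interior samples n = 1 .. len(x)-2.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Damped cosine A * r^n * cos(W0*n + theta); all values are illustrative.
A, r, W0, theta = 2.0, 0.99, 0.3 * math.pi, 0.4
x = [A * r ** n * math.cos(W0 * n + theta) for n in range(50)]

psi = teager_kaiser(x)
# Property (2) predicts A^2 * r^(2n) * sin^2(W0) at sample n.
predicted = [A ** 2 * r ** (2 * n) * math.sin(W0) ** 2 for n in range(1, 49)]
max_dev = max(abs(p - q) for p, q in zip(psi, predicted))
print(max_dev)  # expected to sit at floating-point rounding level
```

Because (2) is an algebraic identity, the deviation should be limited only by rounding, independent of the chosen frequency or phase.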

Teager applied $\Psi_d$ to signals resulting from bandpass filtering
speech vowels in the vicinity of their formants. If the formant
were due to a linear resonance, then by (4) the operator's output
would be a decaying exponential. Teager observed, on the other
hand, several "energy" pulses per pitch period, which he viewed
as indicating modulation of formants caused by nonlinear phenomena
such as rapidly varying separated air flow in the vocal
tract. In our work we interpret these energy pulses by using an
AM-FM model, i.e., a frequency modulation (FM) model for the
variation of the center frequency of time-varying formants whose
amplitude varies like the envelope of an amplitude-modulated
(AM) signal. For pure AM and FM signals we found that the energy
operators $\Psi_c$ and $\Psi_d$ can approximately track the envelope
of AM signals, the instantaneous frequency of FM signals, and
the product of these two functions in the case of AM-FM signals.
Our coverage of these interesting results is brief and focuses on a
few special cases; more details and general cases are given in [5].
We have also obtained promising experimental results from applying
this AM-FM modeling to speech signals, bandpass filtered
via Gabor filtering.

2 Continuous-time AM and FM

Consider a general AM signal 

$$x_{AM}(t) = e(t)\cos(\omega_c t + \theta) \qquad (6)$$

where $e(t)$ is an envelope more slowly varying than the carrier.
Henceforth, we shall drop the subscripts $c$, $d$ from $\Psi$ since it
will be clear from the context whether we refer to continuous or
discrete time. By (5) and (4),

$$\begin{aligned}
\Psi[x_{AM}(t)] &= e^2\omega_c^2 + \cos^2(\omega_c t + \theta)\,\Psi(e) = [\omega_c e(t)]^2\,(1 + \text{error}) \\
&\approx [\omega_c e(t)]^2, \quad \text{if } \Psi(e) \ll (\omega_c e)^2
\end{aligned} \qquad (7)$$

where $\approx$ is meant as "approximated by the dominant term". The
order of the approximation error in (7) (where by order we mean
the order of maximum value that a signal or a quantity can assume)
is $O[\Psi(e)/(e\omega_c)^2]$. In [5] it was shown that this error
order is $\ll 1$ if $e(t)$ is any bandlimited signal whose highest
frequency $\omega_a$ is $\ll \omega_c$. Then $\sqrt{\Psi}$ acts as an
envelope detector, because
$\sqrt{\Psi[e(t)\cos(\omega_c t + \theta)]} \propto |e(t)|$. Two special
cases for $e(t)$ are: (i) (AM with carrier) $e(t) = 1 + m\,a(t)$, where
$a(t)$ is the AM information signal and $m < 1$ is the modulation index,
(ii) (AM with suppressed carrier) $e(t) = a(t)$. For simplicity let
$e(t) = \cos(\omega_a t)$ with $\omega_a \ll \omega_c$ (as standard to
assume in AM); then

$$\Psi[\cos(\omega_a t)\cos(\omega_c t + \theta)] \approx [\omega_c \cos(\omega_a t)]^2 \qquad (8)$$

The approximation error in (8) has order $O[(\omega_a/\omega_c)^2]$; e.g.,
it is < 10% if $\omega_a/\omega_c < 1/3$, or < 1% if $\omega_a/\omega_c < 0.1$.
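
The envelope-detection claim can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates the continuous operator (3) with central finite differences (a choice made here for convenience, not the authors' procedure) on an AM-with-carrier signal whose envelope frequency is one tenth of the carrier. The carrier frequency, modulation index, phase, and step size are all assumed values.

```python
import math

def energy_operator(x, t, h=1e-6):
    """Continuous-time operator (3) at time t via central differences.

    x is a callable signal; h is an assumed differentiation step.
    """
    x0, xp, xm = x(t), x(t + h), x(t - h)
    dx = (xp - xm) / (2 * h)
    ddx = (xp - 2 * x0 + xm) / (h * h)
    return dx * dx - x0 * ddx

wc = 2 * math.pi * 1000.0      # carrier in rad/s (illustrative)
wa = 0.1 * wc                  # envelope frequency, wa/wc = 0.1
m, theta = 0.5, 0.3            # modulation index and phase (illustrative)

def envelope(t):
    return 1.0 + m * math.cos(wa * t)

def x_am(t):
    return envelope(t) * math.cos(wc * t + theta)

worst = 0.0
for k in range(2000):          # 20 ms of signal in 10 microsecond steps
    t = k * 1e-5
    est = math.sqrt(max(energy_operator(x_am, t), 0.0)) / wc
    worst = max(worst, abs(est - envelope(t)) / envelope(t))
print(worst)                   # worst relative envelope error over the run
```

Dividing the square root of the operator output by the carrier frequency gives an envelope estimate; for this frequency ratio the relative error should stay below the 1% figure quoted above.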

Consider now the general FM signal 

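$$x_{FM}(t) = \cos[\phi(t)] = \cos\!\left(\omega_c t + \Delta \int_0^t f(\tau)\,d\tau + \theta\right) \qquad (9)$$

where $\phi(t)$ is the instantaneous phase, $\omega_c$ is the carrier frequency,
$\omega_i(t) = d\phi/dt = \omega_c + \Delta f(t)$ is the instantaneous frequency,
$f(t)$ is the FM information signal and varies more slowly than
$\cos(\omega_c t)$, and $\Delta$ is the maximum frequency deviation. It is
assumed that $|f(t)| \le 1$ and $\Delta < \omega_c$. Applying $\Psi$ to
$x_{FM}$ yields

$$\begin{aligned}
\Psi[x_{FM}(t)] &= (\dot\phi)^2 + \ddot\phi\,\sin(2\phi)/2 = (\dot\phi)^2\,(1 + \text{error}) \\
&\approx (\dot\phi)^2 = [\omega_i(t)]^2, \quad \text{if } \ddot\phi \ll 2(\dot\phi)^2
\end{aligned} \qquad (10)$$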


The error in (10) has order $O[\ddot\phi / 2(\dot\phi)^2]$; if this is
$\ll 1$, then $\Psi$ tracks well the instantaneous FM frequency. For the
special case of an FM-chirp signal, $f(t) = st$ and
$\omega_i(t) = \omega_c + \Delta s t$; then the error order is $\ll 1$ if
$\Delta s \ll \omega_c^2$. For bandlimited signals $f$ with highest
frequency $\omega_f$ the error order is $\ll 1$ if
$\Delta\,\omega_f \ll 2(\omega_c)^2$ [5]. Such a special case is the
FM-sine signal, where $f(t) = \cos(\omega_f t)$,
$\beta = \Delta/\omega_f$ is the FM modulation index, and

$$\Psi[\cos(\omega_c t + \beta\sin(\omega_f t) + \theta)] \approx \omega_i^2(t) = [\omega_c + \Delta\cos(\omega_f t)]^2 \qquad (11)$$

The error has order $\ll 1$ if either $\omega_f \ll \omega_c$ (a standard
assumption in FM), or $\Delta \ll \omega_c$ (small frequency deviation), or
both. For example, the error will be < 10% if
$(\Delta/\omega_c)\cdot(\omega_f/\omega_c) < 1/5$.
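
To get a feel for the size of the FM approximation error, the sketch below evaluates the exact operator output for an FM-sine signal (the first equality in (10)) and compares it with the squared instantaneous frequency. The carrier, frequency deviation, and modulating frequency are assumed values, chosen so that the product of the normalized deviation and the normalized modulating frequency is 0.05, well inside the 1/5 rule of thumb.

```python
import math

wc = 2 * math.pi * 1000.0   # carrier in rad/s (illustrative)
dev = 0.1 * wc              # maximum frequency deviation (illustrative)
wf = 0.5 * wc               # modulating frequency (illustrative)
beta = dev / wf             # FM modulation index
theta = 0.0

def psi_fm(t):
    """Exact operator output for cos(phi(t)) and the instantaneous frequency."""
    phi = wc * t + beta * math.sin(wf * t) + theta
    dphi = wc + dev * math.cos(wf * t)          # instantaneous frequency
    ddphi = -dev * wf * math.sin(wf * t)        # its time derivative
    return dphi ** 2 + 0.5 * ddphi * math.sin(2 * phi), dphi

worst = 0.0
for k in range(5000):                           # 10 ms in 2 microsecond steps
    psi, w_inst = psi_fm(k * 2e-6)
    worst = max(worst, abs(psi - w_inst ** 2) / w_inst ** 2)
print(worst)                                    # worst relative error in Psi
```

The printed value is the largest relative deviation of the operator output from the squared instantaneous frequency; with these settings it should remain comfortably under 10%.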

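Consider now a general AM-FM signal, i.e., an FM signal
whose amplitude is the envelope $e(t)$ of an AM signal:

$$x_{AFM}(t) = e(t)\cos[\phi(t)] = e(t)\cos\!\left(\omega_c t + \Delta\int_0^t f(\tau)\,d\tau + \theta\right) \qquad (12)$$

Then, by (5) and (10),

$$\begin{aligned}
\Psi[x_{AFM}(t)] &= e^2\left[(\dot\phi)^2 + \ddot\phi\,\sin(2\phi)/2\right] + \cos^2(\phi)\,\Psi(e) \\
&= [e(t)\,\omega_i(t)]^2\,(1 + \text{error}) \\
&\approx [e(t)\,\omega_i(t)]^2, \quad \text{if } \ddot\phi \ll 2(\dot\phi)^2 \text{ and } \Psi(e) \ll (e\dot\phi)^2
\end{aligned} \qquad (13)$$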

The approximation error in (13) is of the order

$$O(\text{error}) = \max\left\{ O\!\left[\frac{\ddot\phi}{2(\dot\phi)^2}\right],\; O\!\left[\frac{\Psi(e)}{(e\dot\phi)^2}\right] \right\} \qquad (14)$$

This error is $\ll 1$ if $e(t)$, $f(t)$ are bandlimited signals whose
highest frequency is $\ll \omega_c$ [5]. Then
$\sqrt{\Psi[e(t)\cos(\int \omega_i(\tau)\,d\tau)]} \approx |e(t)|\,\omega_i(t)$
and thus the output of $\sqrt{\Psi}$ is the product of two parts:
an FM instantaneous frequency $\omega_i(t)$ and the AM envelope $|e(t)|$.
This result generalizes the tracking ability of $\sqrt{\Psi}$, which for
$A\cos(\omega_0 t)$ signals yields $A\omega_0$, whereas for AM-FM signals the
constant amplitude $A$ and frequency $\omega_0$ are replaced by the AM
envelope and the FM instantaneous frequency.

As a special case of the AM-FM model, let $e(t) = \cos(\omega_a t)$
and $f(t) = \cos(\omega_f t)$. Then

$$\sqrt{\Psi[\cos(\omega_a t)\cos(\omega_c t + \beta\sin(\omega_f t) + \theta)]} \approx |\cos(\omega_a t)|\,[\omega_c + \Delta\cos(\omega_f t)] \qquad (15)$$

The error will be $\ll 1$ if (i) $\Delta \ll \omega_c$ or
$\omega_f \ll \omega_c$ and (ii) $\omega_a \ll \omega_c$.
If $\Delta \ll \omega_c$, then the $\omega_i(t)$ variations have much smaller
amplitude than that of $\cos(\omega_a t)$, and AM dominates over FM, i.e., the
$\sqrt{\Psi}$ output follows the AM envelope signal. AM also wins over FM
if $\omega_f \ll 2\pi/L \ll \omega_a$, where $L$ is the time duration of the
analysis window. If $\omega_f \gg 2\pi/L \gg \omega_a$, then FM wins over AM,
i.e., $\sqrt{\Psi}$ tracks the FM instantaneous frequency.

A by-product of the AM-FM model is a better algorithm for
FM detection. Taking the derivative of the FM signal $\cos\phi(t)$
gives an AM-FM signal $y(t) = -\dot\phi(t)\sin\phi(t)$ whose envelope
is the FM instantaneous frequency. Hence, applying (13) with
$e = \dot\phi$ gives $\sqrt{\Psi[y(t)]} \approx [\omega_i(t)]^2$. Thus, we can
build an FM detector from a differentiator followed by the $\Psi$ operator.
By (10), $\sqrt{\Psi}$ applied to an FM signal tracks its instantaneous
frequency $\omega_i(t)$, but if $\sqrt{\Psi}$ is instead applied to the FM
derivative, then it tracks $[\omega_i(t)]^2$. Hence, the latter case will
give a better FM tracking, since the square law will emphasize the
oscillations of the instantaneous frequency.

Although all the results in Section 2 (for notational simplicity)
referred to unit-amplitude cosines with no exponential decay,
they can be easily extended to incorporate an amplitude
$A$ and/or an exponential decay $e^{rt}$ in the input signal by just
multiplying the energy operator's output with $A^2 e^{2rt}$, because
$\Psi[A e^{rt} x(t)] = A^2 e^{2rt}\,\Psi[x(t)]$.

3 Discrete-time AM-FM 

By discretizing derivatives we can obtain from $\Psi_c$ an expression
closely related to $\Psi_d$ and thus link the two operators. We examined
several cases, e.g., the 2-sample backward difference

$$\dot{x}(t) \mapsto x(n) - x(n-1) \;\Longrightarrow\; \Psi_c[x(t)] \mapsto \Psi_d[x(n-1)] \qquad (16)$$

Likewise, the 2-sample forward difference $\dot{x} \mapsto x(n+1) - x(n)$
gives $\Psi_d[x(n+1)]$. Thus both asymmetric 2-sample differences
succeed in transforming $\Psi_c$ into $\Psi_d$ (modulo one sample shift).
However, 2-sample or 3-sample symmetric differences fail because
they give more complicated expressions [5]. Next we apply $\Psi_d$ to
a few cases of discrete AM and FM signals.
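
The link in (16) is an algebraic identity and can be confirmed on arbitrary data. The sketch below assumes that the second derivative is discretized by applying the same backward difference twice; the helper names and the random test sequence are illustrative.

```python
import random

def psi_d(x, n):
    """Discrete operator (1) at index n."""
    return x[n] ** 2 - x[n - 1] * x[n + 1]

def psi_c_backward(x, n):
    """Operator (3) with derivatives replaced by backward differences."""
    d1 = x[n] - x[n - 1]                     # first backward difference
    d2 = x[n] - 2 * x[n - 1] + x[n - 2]      # backward difference applied twice
    return d1 ** 2 - x[n] * d2

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(100)]
gap = max(abs(psi_c_backward(x, n) - psi_d(x, n - 1)) for n in range(2, 100))
print(gap)  # rounding-level: the identity holds for any sequence
```

Expanding `psi_c_backward` by hand gives $x^2(n-1) - x(n)\,x(n-2)$, which is the discrete operator delayed by one sample, so the gap is limited only by floating-point rounding.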

Let $X(n) = \cos(\Omega_a n)\cos(\Omega_c n + \theta)$ be an AM signal. From
the general property

$$\Psi_d[x(n)y(n)] = x^2(n)\,\Psi_d[y(n)] + y^2(n)\,\Psi_d[x(n)] - \Psi_d[x(n)]\,\Psi_d[y(n)] \qquad (17)$$

and by (2) it follows that $\Psi[X(n)]$ is equal to
$\cos^2(\Omega_a n)\sin^2(\Omega_c) + [\cos^2(\Omega_c n + \theta) - \sin^2(\Omega_c)]\sin^2(\Omega_a)$.
Hence,

$$\Psi[\cos(\Omega_a n)\cos(\Omega_c n + \theta)] \approx [\sin(\Omega_c)\cos(\Omega_a n)]^2 \qquad (18)$$

if $\sin^2(\Omega_a) \ll \tan^2(\Omega_c)$ (which holds if $\Omega_a \ll \Omega_c$).
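
Both the product rule (17) and the AM approximation (18) are easy to test on a synthetic signal. The following sketch uses illustrative frequencies and phase; the bound checked for (18) comes from the exact expression above, whose residual term has magnitude at most $\sin^2(\Omega_a)$.

```python
import math

def psi_d(x, n):
    """Discrete operator (1) at index n."""
    return x[n] ** 2 - x[n - 1] * x[n + 1]

Wa, Wc, theta = 0.01 * math.pi, 0.2 * math.pi, 0.5   # illustrative values
N = 400
env = [math.cos(Wa * n) for n in range(N)]
car = [math.cos(Wc * n + theta) for n in range(N)]
am = [e * c for e, c in zip(env, car)]

# Product rule (17): exact for any pair of sequences.
rule_gap = max(
    abs(psi_d(am, n)
        - (env[n] ** 2 * psi_d(car, n) + car[n] ** 2 * psi_d(env, n)
           - psi_d(env, n) * psi_d(car, n)))
    for n in range(1, N - 1))

# Approximation (18): absolute error bounded by sin^2(Wa).
approx_gap = max(
    abs(psi_d(am, n) - (math.sin(Wc) * env[n]) ** 2) for n in range(1, N - 1))
print(rule_gap, approx_gap, math.sin(Wa) ** 2)
```

The first printed number should be at rounding level, while the second should not exceed the third.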

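Consider the discrete-time FM-sine signal

$$Y(n) = \cos[\phi(n)] = \cos[\Omega_c n + \beta\sin(\Omega_f n) + \theta] \qquad (19)$$

where $\beta = \Delta/\Omega_f$, and the instantaneous frequency is
$\Omega_i(n) = d\phi(n)/dn = \Omega_c + \Delta\cos(\Omega_f n)$. For applying
$\Psi$ to $Y(n)$ note that if
$\mathcal{A} = \Omega_c n + \beta\cos(\Omega_f)\sin(\Omega_f n) + \theta$ and
$\mathcal{B} = \Omega_c + \beta\sin(\Omega_f)\cos(\Omega_f n)$,
then $Y(n+1)\,Y(n-1) = (\cos 2\mathcal{A} + \cos 2\mathcal{B})/2 = \cos^2(\mathcal{A}) - \sin^2(\mathcal{B})$.
If $\Omega_f$ is sufficiently small such that $\cos(\Omega_f) \approx 1$ and
$\sin(\Omega_f) \approx \Omega_f$, then $\cos(\mathcal{A}) \approx Y(n)$,
$\mathcal{B} \approx \Omega_i(n)$, and by (1)

$$\Psi[\cos(\phi(n))] \approx \sin^2[\Omega_c + \Delta\cos(\Omega_f n)] \qquad (20)$$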
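
A short numerical experiment illustrates how tight (20) is when the modulating frequency is small. The carrier, modulating frequency, and frequency deviation below are assumed values for illustration only.

```python
import math

Wc, Wf = 0.2 * math.pi, 0.02 * math.pi    # carrier and modulating frequency
dev = 0.1 * Wc                            # frequency deviation (illustrative)
beta, theta = dev / Wf, 0.0

def y(n):
    """Discrete FM-sine signal (19)."""
    return math.cos(Wc * n + beta * math.sin(Wf * n) + theta)

worst = 0.0
for n in range(1, 500):
    psi = y(n) ** 2 - y(n - 1) * y(n + 1)                 # operator (1)
    approx = math.sin(Wc + dev * math.cos(Wf * n)) ** 2   # right side of (20)
    worst = max(worst, abs(psi - approx))
print(worst)   # largest absolute gap between the two sides of (20)
```

For these settings the operator output itself lies roughly between 0.28 and 0.41, so an absolute gap in the low thousandths corresponds to an error well under one percent.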

All the results in Section 3 can be easily extended to incorporate
an amplitude $A \ne 1$ and/or an exponential decay $r^n$ in
the input signal by just multiplying the output of $\Psi$ by $A^2 r^{2n}$.

We conducted many experiments on synthetic discrete AM
and FM signals that verified the theoretical results and validated
the approximate formulas. Figure 1 illustrates some of the above
conclusions.

4 Modeling Speech Resonances 

Teager's work in steady-state vowels provides evidence for resonances
with time-varying formants and amplitudes [2]. In our
work we model a single speech resonance by a damped AM-FM
model

$$R(n) = A r^n \cos(\Omega_a n)\cos(\Omega_c n + \beta\sin(\Omega_f n) + \theta) \qquad (21)$$

where $\Omega_c$ is the center frequency of the formant, the instantaneous
FM frequency $\Omega_i(n) = \Omega_c + \Delta\cos(\Omega_f n)$ models the
time-varying formant, and $\Delta = \beta\,\Omega_f$ controls the amount of FM.
The AM envelope $|\cos(\Omega_a n)|$ tracks the amplitude variations and $r$ is
the rate of energy dissipation. Then by (18) and (20)

$$\sqrt{\Psi[R(n)]} \approx \left| A r^n \cos(\Omega_a n)\sin(\Omega_c + \Delta\cos(\Omega_f n)) \right| \qquad (22)$$

This approximation assumes that $\Omega_f$ is small and that
$\Omega_a \ll \Omega_c$ (see the discrete AM and FM case). Thus,
$\sqrt{\Psi[R(n)]}$ is a (damped) product of the envelope and the (sine of
the) instantaneous frequency of the resonance. This class of signals may
serve as a model of energy pulses observed on actual speech
waveforms. An alternative model for the FM frequency that
is consistent with our previous analysis is the chirp
$\Omega_i(n) = \Omega_c + \Delta s n$. In addition, an alternative envelope
model would be $e(n) = 1 + m\cos(\Omega_a n)$, where $m$ measures the amount of
AM; then $\sqrt{\Psi[e(n)\cos(\int \Omega_i(n))]} \approx e(n)\,|\sin\Omega_i(n)|$
if $\Omega_f$ is small and $\Omega_a \ll \Omega_c$ or $m \ll 1$ [5].

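In our work we extract a single resonance by bandpass filtering
the speech signal with a Gabor filter, whose impulse and
frequency response are

$$h(t) = \exp(-\alpha^2 t^2)\cdot\cos(\omega_c t) \qquad (23)$$

$$H(\omega) = \frac{\sqrt{\pi}}{2\alpha}\left(\exp\!\left[-\frac{(\omega-\omega_c)^2}{4\alpha^2}\right] + \exp\!\left[-\frac{(\omega+\omega_c)^2}{4\alpha^2}\right]\right) \qquad (24)$$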

The Gaussian shape of $H(\omega)$ avoids producing side lobes that
could produce false pulses in the $\Psi$ output. The bandwidth
(measured between the points at 10% of peak value) is about
$BW = 1.7\alpha$ (in Hz). Our design of the discrete bandpass Gabor
filter proceeds as follows: A center formant frequency $F_c$
is manually selected from the short-time speech spectrum by
visual inspection. A value of $\alpha$ is selected such that
$BW \in [0.5F_c, F_c]$. $h(t)$ is discretized by replacing $t$ with $nT$,
where $T$ is the sampling period, and truncating $h(n)$ to a symmetric FIR
filter, $h(n) = \exp(-b^2 n^2)\cdot\cos(\Omega_c n)$, with $-N \le n \le N$,
$b = \alpha T$, and $\Omega_c = 2\pi F_c T$. Then the Gabor bandpass filtering
is performed by convolving the truncated $h(n)$ with the speech signal. The
integer $N$ is chosen to truncate the Gaussian envelope of $h(n)$
essentially to zero; e.g., $N = 2.5/(\alpha T)$ yields $\exp(-b^2 N^2) = 0.002$.
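
The design steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `gabor_taps` and `gain` are hypothetical, the Gaussian width parameter 1000 is an assumed value chosen so that the stated bandwidth rule gives 1700 Hz (half of the 3400 Hz center frequency), and rounding $N$ to the nearest integer is an assumption of this sketch.

```python
import math

def gabor_taps(fc_hz, alpha, fs_hz):
    """Taps h(n) = exp(-b^2 n^2) * cos(Wc n) for -N <= n <= N.

    b = alpha*T and Wc = 2*pi*fc*T follow the text; N = 2.5/(alpha*T)
    is rounded to the nearest integer (an assumption of this sketch).
    """
    T = 1.0 / fs_hz
    b = alpha * T
    Wc = 2.0 * math.pi * fc_hz * T
    N = round(2.5 * fs_hz / alpha)
    taps = [math.exp(-(b * n) ** 2) * math.cos(Wc * n) for n in range(-N, N + 1)]
    return taps, N

def gain(taps, N, f_hz, fs_hz):
    """Magnitude of the FIR frequency response at f_hz (direct DTFT sum)."""
    W = 2.0 * math.pi * f_hz / fs_hz
    re = sum(h * math.cos(W * n) for h, n in zip(taps, range(-N, N + 1)))
    im = sum(h * math.sin(W * n) for h, n in zip(taps, range(-N, N + 1)))
    return math.hypot(re, im)

# Assumed example: 30 kHz sampling, 3400 Hz center, width parameter 1000.
taps, N = gabor_taps(fc_hz=3400.0, alpha=1000.0, fs_hz=30000.0)
edge = math.exp(-(1000.0 / 30000.0 * N) ** 2)   # envelope value at n = N
in_band = gain(taps, N, 3400.0, 30000.0)
out_band = gain(taps, N, 500.0, 30000.0)
print(N, len(taps), edge, in_band, out_band)
```

Filtering a speech segment then amounts to convolving it with `taps`; the two gain values give a rough sense of how strongly a tone far from the center frequency is suppressed relative to one at the center.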

Fig. 2 shows (a) a segment of a speech vowel /e/ sampled
at $F_s = 30$ kHz and (b) the output from $\sqrt{\Psi}$ when applied to
a bandpass filtered version of (a) extracted around a formant at
$F_c = 3400$ Hz using a Gabor filter with $b = 1000T$ and $N = 75$.
Fig. 3 is similar to Fig. 2 but for a sustained vowel /a/ sampled
at 10 kHz with $F_c = 2620$ Hz, $b = 1500T$, and $N = 50$. There
are 2-3 pulses present per pitch period, and the damped AM-FM
model (22) may approximately explain the shape of these
measured energy pulses. There have been cases where we have
observed only one major pulse per pitch period. This may be
partially explained by a low percent of (AM or FM) modulation
or by small $\Omega_a$, $\Omega_f$.

Equation (22) has generally both an AM and an FM component.
The pulses could be due to both or just one of them.
If $\Delta \ll \Omega_c$ (or if $\Omega_f \ll 2\pi F_0 \ll \Omega_a$, where
$F_0$ is the pitch frequency), then AM wins over FM and $\sqrt{\Psi}$ follows
essentially the envelope of the resonance; such a case is illustrated in
Fig. 1c via a synthetic AM-FM signal. If however
$\Omega_a \ll 2\pi F_0 \ll \Omega_f$ (or if $m \ll 1$ in the case of the
$1 + m\cos(\Omega_a n)$ envelope), then the FM dominates over AM. By testing
ideas on synthetic AM-FM signals and by running zero-crossing FM detectors
on speech resonances, we have seen in our speech experiments that AM tends
to dominate FM. Also if the FM frequency deviation $\Delta$ is small,
then the exponential decay makes the FM component harder to
detect. Thus additional effort is required to isolate and purify
the FM component.

Finally, note that we have assumed in all the above analysis
and experiments the presence of a single resonance in the vocal
tract output. Actual speech vowels are quasi-periodic and may
consist of multiple resonances. Both these phenomena introduce
an additive component to the single resonance which may alter
the output of the energy operator. Consider the case that arises
when there are two formants closely spaced. Let us model this
situation with the signal $x(t) = \sin(\omega_1 t + 2\theta) + \sin(\omega_2 t)$.
Then $x(t) = 2\cos(\omega_a t + \theta)\sin(\omega_c t + \theta)$ is an AM
signal whose carrier and envelope frequencies are
$\omega_c = (\omega_1 + \omega_2)/2$ and $\omega_a = (\omega_1 - \omega_2)/2$.
By (7), $\sqrt{\Psi[x(t)]} \approx 2\omega_c\,|\cos(\omega_a t + \theta)|$, and
hence $\sqrt{\Psi}$ will track the envelope if the approximation error order
$O[(1 - d)^2/(1 + d)^2]$ is $\ll 1$, where $d = \omega_2/\omega_1 < 1$
(assume $\omega_1 > \omega_2$). For this error to be < 10%, $d > 0.5$, i.e.,
the two formants must be less than an octave apart. Then we observe an AM
modulation of one formant by the other. Consider also the case of two
consecutive pitch harmonics falling within the resonance bandwidth and
passing through the Gabor filter. Then the above model (consisting of two
additive sines) holds and may predict a possible
tracking of an AM envelope. However, this AM envelope varies
with a frequency roughly equal to the pitch frequency and thus
the modulation does not introduce additional pulses over a pitch
period. Finally, in the time domain, closely spaced responses
from the vocal tract due to consecutive pitch pulses may also
introduce fluctuations in the $\Psi$ output which are not consequences
of the AM or FM modulation of the resonance itself.
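
The two-formant argument rests on a sum-to-product identity and on the error order expressed through the frequency ratio. The sketch below checks both; the two formant frequencies and the phase are assumed values, and `error_order` is a hypothetical helper name.

```python
import math

def error_order(w1, w2):
    """Error order ((1 - d)/(1 + d))^2 with d = w2/w1, assuming w1 > w2."""
    d = w2 / w1
    return ((1.0 - d) / (1.0 + d)) ** 2

# Two closely spaced components (illustrative frequencies and phase).
w1, w2, theta = 2 * math.pi * 2600.0, 2 * math.pi * 2000.0, 0.2
wc, wa = (w1 + w2) / 2, (w1 - w2) / 2

# The sum of two sines equals an AM signal with carrier wc and envelope wa.
gap = max(
    abs(math.sin(w1 * t + 2 * theta) + math.sin(w2 * t)
        - 2 * math.cos(wa * t + theta) * math.sin(wc * t + theta))
    for t in [k * 1e-5 for k in range(200)])

print(gap)                       # rounding-level difference
print(error_order(w1, w2))       # equals (wa/wc)^2
print(error_order(2.0, 1.0))     # one octave apart: 1/9, about 11%
```

The last line shows why the one-octave spacing marks the boundary of the roughly 10% error regime.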


5 Conclusions

Our discussion motivates the following model for the vocal tract
response (over a pitch period):

$$S(n) = \sum_{k=1}^{K} \cos(\Omega_{a,k}\, n)\cos(\Omega_{c,k}\, n + \beta_k \sin(\Omega_{f,k}\, n) + \theta_k) \qquad (25)$$

where $K$ is the number of resonances. We also may include
interaction between resonances by allowing coupling between the
$\Omega_a$'s and the $\Omega_f$'s of the same resonance or among different
resonances. It is important to emphasize that this is not an AM-FM
model of speech production, but rather the AM-FM is a mathematical
vehicle to model the acoustical consequences of some
nonlinear mechanisms of speech production. One approach to
obtaining the parameters of such a model would be to 1) find the
$K$ center frequencies $\Omega_c$ (e.g., from the short-time Fourier
transform), 2) iteratively extract each frequency band and model it as
an AM-FM signal by using the operator $\Psi$, 3) subtract the modeled
AM-FM component from the total speech signal and model
the remainder of the resonances.

Figure 1. AM-FM signals $X(n) = r^n \cos(\Omega_a n)\cos[\Omega_c n + (\Delta/\Omega_f)\sin(\Omega_f n)]$
and the energy operator outputs $\sqrt{\Psi[X(n)]}$, where $\Omega_c = 0.2\pi$,
$r = 0.004$: (a) $\Omega_a = 0.01\pi$, $\Omega_f = 0$. (b) $\Omega_a = 0$,
$\Omega_f = 0.02\pi$, $\Delta = \Omega_c$. (c) $\Omega_a = 0.01\pi$,
$\Omega_f = 0.05\pi$, $\Delta = \Omega_c/10$.

Figure 2. (a) Original speech vowel /e/. (b) Output of energy
operator $\sqrt{\Psi}$ when applied to a resonance of (a).

Figure 3. (a) Sustained speech vowel /a/. (b) Output of energy
operator $\sqrt{\Psi}$ when applied to a resonance of (a).